
Conversation


@sitaowang1998 sitaowang1998 commented Oct 23, 2025

Note

This PR depends on #1423

Description

This PR enables configuration-level control over whether to run instrumentation by:

  • Adding an enable_profiling field to the QueryWorker model.
  • Adding a CLP_QUERY_WORKER_ENABLE_PROFILING entry to the query worker's EnvDict.
  • Adding a CLP_ENABLE_PROFILING environment variable to the query_worker service in the Docker files.
  • Reading the environment variable during profiling setup.

This approach is extensible if we later want fine-grained instrumentation control for individual components; a minimal sketch of the resulting plumbing follows.
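A minimal sketch of the plumbing described above, assuming a simplified model and a hypothetical helper (only enable_profiling and CLP_QUERY_WORKER_ENABLE_PROFILING come from this PR; the rest is illustrative):

    from pydantic import BaseModel

    class QueryWorker(BaseModel):
        logging_level: str = "INFO"  # simplified; the real model uses a LoggingLevel type
        enable_profiling: bool = False  # omitting the entry from the config file keeps profiling off

    def query_worker_env(worker_config: QueryWorker) -> dict:
        # The controller lowers the boolean to a string before handing it to the worker container.
        return {"CLP_QUERY_WORKER_ENABLE_PROFILING": str(worker_config.enable_profiling).lower()}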

Warning

This PR is a breaking change because it adds a new field to the configuration. However, if the user does not add this entry to the config file, instrumentation is off by default.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Profiling results are available when enable_profiling is set to true.
  • Profiling results are not available when enable_profiling is unset or set to false.

Summary by CodeRabbit

  • New Features

    • Optional runtime profiling for query workflows (extract/search tasks and key scheduler operations), producing interactive flame‑graphs and textual call‑tree reports.
  • Chores

    • Added profiling dependency.
    • Added configuration flag and an environment‑variable toggle (off by default) to enable/disable profiling in deployments.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner October 23, 2025 21:26

coderabbitai bot commented Oct 23, 2025

Walkthrough

Adds a pyinstrument-based profiling utility and config flag, propagates profiling environment variables to the query worker, adds the pyinstrument dependency, and instruments selected Celery tasks and scheduler functions with a profiling decorator.

Changes

  • Config field (components/clp-py-utils/clp_py_utils/clp_config.py): Added enable_profiling: bool = False to the QueryWorker config.
  • Dependency (components/clp-py-utils/pyproject.toml): Added the pyinstrument>=5.1.1 dependency.
  • Deployment env (tools/deployment/package/docker-compose.yaml): Added the CLP_ENABLE_PROFILING environment variable (default false) to the query-worker service.
  • Controller env propagation (components/clp-package-utils/clp_package_utils/controller.py): Sets CLP_QUERY_WORKER_ENABLE_PROFILING from clp_config.query_worker.enable_profiling (lowercased string) when configuring the query worker environment.
  • Profiling utility (components/clp-py-utils/clp_py_utils/profiling_utils.py): New profile(...) decorator supporting sync/async functions, context extraction (job_id/task_id via parameter or dotted attribute), a CLP_ENABLE_PROFILING gate, nested-profile skipping, and saving HTML/TXT outputs under the logs dir; a sketch follows this list.
  • Task instrumentation (components/job-orchestration/.../executor/query/extract_stream_task.py, components/job-orchestration/.../executor/query/fs_search_task.py): Imported and applied @profile(section_name=...) to the extract_stream and search Celery tasks.
  • Scheduler instrumentation (components/job-orchestration/.../scheduler/query/query_scheduler.py): Applied @profile(...) to dispatch_query_job, acquire_reducer_for_job, and handle_finished_search_job with job_id_param="job.id".
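To make the profiling-utility entry concrete, here is a simplified, hypothetical sketch of how such a profile(...) decorator can be structured (environment-variable gate, dotted-attribute context extraction, sync/async wrappers). It is not the code under review; nested-profile skipping and the text report are omitted for brevity:

    import asyncio
    import functools
    import os
    from pathlib import Path

    def _extract(kwargs, dotted_name):
        # Resolve "job_id" or dotted names such as "job.id" from the call's keyword arguments.
        name, _, rest = dotted_name.partition(".")
        value = kwargs.get(name)
        for attr in filter(None, rest.split(".")):
            value = getattr(value, attr, None) if value is not None else None
        return value

    def _save_profile(profiler, section_name, job_id, task_id):
        # Write the rendered HTML report under $CLP_LOGS_DIR/profiles.
        out_dir = Path(os.getenv("CLP_LOGS_DIR", ".")) / "profiles"
        out_dir.mkdir(parents=True, exist_ok=True)
        parts = [section_name]
        if job_id is not None:
            parts.append(f"job{job_id}")
        if task_id is not None:
            parts.append(f"task{task_id}")
        (out_dir / ("_".join(parts) + ".html")).write_text(profiler.output_html(), encoding="utf-8")

    def profile(section_name, job_id_param="job_id", task_id_param="task_id"):
        def decorator(func):
            def _enabled():
                return os.getenv("CLP_ENABLE_PROFILING", "false").lower() == "true"

            if asyncio.iscoroutinefunction(func):
                @functools.wraps(func)
                async def async_wrapper(*args, **kwargs):
                    if not _enabled():
                        return await func(*args, **kwargs)
                    from pyinstrument import Profiler  # imported lazily so the dependency stays optional
                    with Profiler() as profiler:
                        result = await func(*args, **kwargs)
                    _save_profile(profiler, section_name,
                                  _extract(kwargs, job_id_param), _extract(kwargs, task_id_param))
                    return result
                return async_wrapper

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                if not _enabled():
                    return func(*args, **kwargs)
                from pyinstrument import Profiler
                with Profiler() as profiler:
                    result = func(*args, **kwargs)
                _save_profile(profiler, section_name,
                              _extract(kwargs, job_id_param), _extract(kwargs, task_id_param))
                return result
            return sync_wrapper
        return decorator

Usage on a Celery task would then mirror the pattern described for the instrumented tasks, with @profile(section_name="search") placed under the @app.task decorator so Celery registers the profiled wrapper.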

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Decorator as "Profile\nDecorator"
    participant Profiler as "Pyinstrument\nProfiler"
    participant Target as "Instrumented\nFunction"
    participant Storage as "Profiles\nDirectory"

    Caller->>Decorator: call decorated function
    activate Decorator
    Decorator->>Decorator: check CLP_ENABLE_PROFILING & parent profiler
    alt profiling disabled or nested
        Decorator->>Target: invoke directly
        Target-->>Decorator: result
    else profiling enabled
        Decorator->>Decorator: extract context (job_id, task_id)
        Decorator->>Profiler: start
        activate Profiler
        Decorator->>Target: invoke (sync or await async)
        Target-->>Decorator: result
        Profiler-->>Decorator: stop & render (HTML + TXT)
        Decorator->>Storage: save files (timestamp + context)
        Storage-->>Decorator: saved
        Decorator->>Decorator: log summary
        deactivate Profiler
    end
    Decorator-->>Caller: return result
    deactivate Decorator

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review profiling_utils.py for correctness around async handling, context extraction (dotted attribute access), and safe behaviour when pyinstrument is absent or when nested profiling occurs.
  • Validate controller env var name and value formatting (CLP_QUERY_WORKER_ENABLE_PROFILING lowercasing).
  • Confirm instrumentation decorators on Celery tasks/scheduler do not change task signatures or break task registration.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 37.50%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title "Add profile setting inside CLPConfig" directly aligns with the primary objective stated in the PR description, which is to introduce configuration control for profiling in the query worker by adding an enable_profiling field to the QueryWorker model within CLPConfig. The title is specific, concise, and accurately describes the foundational configuration change. While the PR also includes supporting changes, such as the profiling utilities module and the decorator applications across multiple files, these implement the core configuration control mechanism. The title effectively communicates the main configuration contribution without unnecessary noise or vague terminology.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 344b018 and 1d9a7be.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/controller.py (1)

377-382: LGTM! Profiling configuration properly integrated.

The environment variable setup correctly converts the boolean config value to a lowercase string format and follows the established naming conventions.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6abc934 and 555592f.

⛔ Files ignored due to path filters (1)
  • components/clp-py-utils/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • components/clp-package-utils/clp_package_utils/controller.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
  • components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunks)
  • components/clp-py-utils/pyproject.toml (1 hunks)
  • components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4 hunks)
  • tools/deployment/package/docker-compose.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (30-118)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (30-118)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_logging.py (1)
  • get_logger (18-26)
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (30-118)
🪛 Ruff (0.14.1)
components/clp-py-utils/clp_py_utils/profiling_utils.py

56-56: Dynamically typed expressions (typing.Any) are disallowed in *args

(ANN401)


56-56: Dynamically typed expressions (typing.Any) are disallowed in **kwargs

(ANN401)


56-56: Dynamically typed expressions (typing.Any) are disallowed in async_wrapper

(ANN401)


72-73: Logging statement uses f-string

(G004)


88-88: Dynamically typed expressions (typing.Any) are disallowed in *args

(ANN401)


88-88: Dynamically typed expressions (typing.Any) are disallowed in **kwargs

(ANN401)


88-88: Dynamically typed expressions (typing.Any) are disallowed in sync_wrapper

(ANN401)


104-104: Logging statement uses f-string

(G004)


180-180: Do not catch blind exception: Exception

(BLE001)


181-181: Logging statement uses f-string

(G004)


213-213: Logging statement uses f-string

(G004)


219-219: datetime.datetime.now() called without a tz argument

(DTZ005)


243-243: f-string without any placeholders

Remove extraneous f prefix

(F541)


254-256: Logging statement uses f-string

(G004)


260-260: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


260-260: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)

15-15: Decorator order and usage look good.

@profile placed under @app.task ensures Celery wraps the profiled wrapper; defaults align with job_id/task_id.

Also applies to: 170-171

components/clp-package-utils/clp_package_utils/controller.py (1)

375-380: LGTM! Profiling configuration properly integrated.

The environment variable setup follows the established pattern in this file and correctly converts the boolean configuration value to a lowercase string suitable for environment variable consumption.

components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (1)

17-17: LGTM! Profiling decorator correctly applied to Celery task.

The import and decorator placement are correct. The extract_stream function's job_id and task_id parameters match the profiling decorator's defaults, enabling automatic context extraction for profile output files.

Also applies to: 184-184

components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4)

49-49: LGTM! Profiling utility imported for scheduler instrumentation.


548-548: LGTM! Profiling decorator correctly configured for job dispatch.

The decorator uses dot notation (job_id_param="job.id") to extract the job ID from the QueryJob parameter, which is explicitly supported by the profiling utility. The absence of a task_id parameter is acceptable for scheduler-level profiling.


570-570: LGTM! Async reducer acquisition properly instrumented.

The profiling decorator correctly handles async functions and will capture the time spent acquiring reducer resources, providing valuable insights into job orchestration overhead.


899-899: LGTM! Search job completion handling properly instrumented.

The decorator configuration is consistent with other scheduler functions and will provide visibility into job finalization overhead, including reducer communication and database updates.

Comment on lines 272 to 275
class QueryWorker(BaseModel):
    logging_level: LoggingLevel = "INFO"
    enable_profiling: bool = False


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Add field description for clarity and ensure env plumbing is present.

Document the new config knob.

 class QueryWorker(BaseModel):
     logging_level: LoggingLevel = "INFO"
-    enable_profiling: bool = False
+    enable_profiling: bool = Field(
+        False,
+        description="Enable pyinstrument runtime profiling for query worker tasks."
+    )

Run to confirm the plumbing from config -> env -> container:


🏁 Script executed:

#!/bin/bash
# Verify config->env mapping for profiling
rg -nC2 -e 'CLP_QUERY_WORKER_ENABLE_PROFILING|CLP_ENABLE_PROFILING|enable_profiling' \
  --glob '!**/dist/**' --glob '!**/build/**'

Length of output: 2231


Add field description for documentation clarity.

The config → env plumbing is already correctly implemented and verified. Consider adding a Field description to document the profiling knob:


 class QueryWorker(BaseModel):
     logging_level: LoggingLevel = "INFO"
-    enable_profiling: bool = False
+    enable_profiling: bool = Field(
+        False,
+        description="Enable pyinstrument runtime profiling for query worker tasks."
+    )
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 272 to 275,
add a Field description to the enable_profiling attribute to document what the
profiling knob does; replace the bare bool annotation with a pydantic Field for
enable_profiling that includes a concise description (e.g., when True, enables
runtime profiling for query workers and controls output location/verbosity as
applicable) so the generated docs and env config are clear.


F = TypeVar("F", bound=Callable[..., Any])

PROFILING_INTERVAL_SECONDS = 0.001

🧹 Nitpick | 🔵 Trivial

Make sampling interval configurable via env.

Allows tuning overhead without code changes.

-PROFILING_INTERVAL_SECONDS = 0.001
+PROFILING_INTERVAL_SECONDS = float(os.getenv("CLP_PROFILING_INTERVAL_SECONDS", "0.001"))
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around line 27, the
hard-coded PROFILING_INTERVAL_SECONDS = 0.001 should be made configurable via an
environment variable; change it to read an env var (e.g.
CLP_PROFILING_INTERVAL_SECONDS) with a fallback default of 0.001, parse it as a
float, validate it is > 0 (or else use the default), and ensure os is imported;
update any related tests or docs to mention the new env var name and default.
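A minimal sketch of that parsing-and-validation step, assuming CLP_PROFILING_INTERVAL_SECONDS as the variable name proposed above:

    import os

    _DEFAULT_PROFILING_INTERVAL_SECONDS = 0.001

    def _read_profiling_interval() -> float:
        # Fall back to the default when the variable is unset, non-numeric, or not positive.
        raw = os.getenv("CLP_PROFILING_INTERVAL_SECONDS", "")
        try:
            value = float(raw)
        except ValueError:
            return _DEFAULT_PROFILING_INTERVAL_SECONDS
        return value if value > 0 else _DEFAULT_PROFILING_INTERVAL_SECONDS

    PROFILING_INTERVAL_SECONDS = _read_profiling_interval()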

Comment on lines +219 to +229
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        filename_parts = [section_name]

        if job_id:
            filename_parts.append(f"job{job_id}")
        if task_id:
            filename_parts.append(f"task{task_id}")

        filename_parts.append(timestamp)
        base_filename = "_".join(filename_parts)


🧹 Nitpick | 🔵 Trivial

Prefer timezone-aware timestamps for filenames.

Ensures clarity across hosts and logs.

-        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+        timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")

Confirm pyinstrument 5.1.x exposes session.sample_count (line 217) as used here.

🧰 Tools
🪛 Ruff (0.14.1)

219-219: datetime.datetime.now() called without a tz argument

(DTZ005)

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 219 to
229, replace the naive datetime.now() filename timestamp with a timezone-aware
UTC timestamp (e.g.
datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")) so
filenames clearly reflect UTC across hosts; keep the rest of filename_parts
logic unchanged. Also verify that pyinstrument 5.1.x exposes
session.sample_count before using it (if it does not, read the session API and
use the appropriate property/method or guard access with hasattr and fallback).
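A hedged sketch covering both points from the prompt, with an illustrative helper name: a UTC timestamp for filenames, and guarded access to sample_count in case the installed pyinstrument version does not expose it:

    import datetime

    def _timestamp_and_sample_count(profiler):
        # Timezone-aware UTC timestamp so filenames read consistently across hosts.
        timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")
        # Guarded attribute access; yields None instead of raising if sample_count is missing.
        sample_count = getattr(getattr(profiler, "last_session", None), "sample_count", None)
        return timestamp, sample_count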

Comment on lines +230 to +241
        output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
        output_dir.mkdir(exist_ok=True, parents=True)

        # Save HTML with interactive visualization
        html_path = output_dir / f"{base_filename}.html"
        with open(html_path, "w", encoding="utf-8") as f:
            f.write(profiler.output_html())

        # Save human-readable text summary with call hierarchy
        txt_path = output_dir / f"{base_filename}.txt"
        with open(txt_path, "w", encoding="utf-8") as f:
            # Header

⚠️ Potential issue | 🟠 Major

Guard against missing CLP_LOGS_DIR (prevents crash).

Path(None) will raise when CLP_LOGS_DIR is unset. Provide a fallback and warn.

-        output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
-        output_dir.mkdir(exist_ok=True, parents=True)
+        logs_dir_env = os.getenv("CLP_LOGS_DIR")
+        base_dir = Path(logs_dir_env) if logs_dir_env else Path.cwd()
+        if not logs_dir_env:
+            logger.warning("CLP_LOGS_DIR not set; writing profiles under %s/profiles", base_dir)
+        output_dir = base_dir / "profiles"
+        output_dir.mkdir(exist_ok=True, parents=True)
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 230 to
241, the code assumes CLP_LOGS_DIR is set and calls
Path(os.getenv("CLP_LOGS_DIR")), which will raise if the env var is None; guard
by reading os.getenv("CLP_LOGS_DIR") into a variable, if it's falsy use a safe
fallback (e.g., tempfile.gettempdir() or Path.cwd()), log a warning using the
module logger or logging.warning that CLP_LOGS_DIR was unset and the fallback is
being used, then construct the Path from that fallback and continue to mkdir and
write files as before.
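A variant of the guard above, sketched with the temporary-directory fallback mentioned in the prompt (everything outside the fallback is unchanged):

    import logging
    import os
    import tempfile
    from pathlib import Path

    logger = logging.getLogger(__name__)

    logs_dir_env = os.getenv("CLP_LOGS_DIR")
    if not logs_dir_env:
        logger.warning("CLP_LOGS_DIR not set; writing profiles under the system temp directory")
    output_dir = Path(logs_dir_env or tempfile.gettempdir()) / "profiles"
    output_dir.mkdir(exist_ok=True, parents=True)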

"mariadb>=1.0.11,<1.1.dev0",
"mysql-connector-python>=9.4.0",
"pydantic>=2.12.3",
"pyinstrument>=5.1.1",

🧹 Nitpick | 🔵 Trivial

Pin pyinstrument to major and consider optionality.

To avoid unexpected breaks on future major releases, pin the upper bound. If you want to keep profiling optional, pair this with a lazy import (see profiling_utils note).

Apply:

-    "pyinstrument>=5.1.1",
+    "pyinstrument>=5.1.1,<6",

@Eden-D-Zhang Eden-D-Zhang left a comment


Nitpick comment, otherwise looks good
